**Question #1**

Let us consider NVIDIA GPGPUs. You are requested to:

* Define the characteristics of these devices in terms of the functions they perform, highlighting the differences with respect to CPUs
* List the main characteristics of their architecture
* Describe the programming model of GPGPUs
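A minimal sketch of the programming model, assuming a 1-D launch and emulating it sequentially in Python. On a real NVIDIA GPU this would be a CUDA `__global__` kernel launched with `<<<grid, block>>>`. The helper names `launch` and `vector_add` are hypothetical. The programmer writes the code for a single thread; the runtime instantiates it over a grid of thread blocks, and each thread derives the data element it owns from its block and thread indices. On hardware, the threads of a block are executed in warps of 32 in SIMT fashion, which a sequential loop does not capture.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch, like kernel<<<grid_dim, block_dim>>>(...) in CUDA."""
    for block_idx in range(grid_dim):        # blocks are independent, so any order is valid
        for thread_idx in range(block_dim):  # on a GPU these threads run concurrently in warps
            kernel(block_idx, thread_idx, block_dim, *args)


def vector_add(block_idx, thread_idx, block_dim, a, b, c, n):
    """Kernel body, written from the point of view of a single thread."""
    i = block_idx * block_dim + thread_idx  # CUDA: blockIdx.x * blockDim.x + threadIdx.x
    if i < n:  # the last block may contain threads with no element to process
        c[i] = a[i] + b[i]


if __name__ == "__main__":
    n, block_dim = 10, 4
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    a = [float(k) for k in range(n)]
    b = [2.0 * k for k in range(n)]
    c = [0.0] * n
    launch(vector_add, grid_dim, block_dim, a, b, c, n)
    print(c)
```

The bounds check `i < n` is what lets the grid size be rounded up to a whole number of blocks: here 3 blocks of 4 threads cover 10 elements, and the 2 surplus threads simply do nothing.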

You are requested to answer using the space available on this page.

**Question #2**

Let us consider a superscalar MIPS64 architecture implementing dynamic scheduling, speculation and multiple issue, composed of the following units:

* An issue unit able to process 2 instructions per clock period; in the case of a branch instruction, only one instruction is issued per clock period
* A commit unit able to process 2 instructions per clock period
* The following functional units (for each unit the number of clock periods to complete one instruction is reported):
  + 1 unit for memory access: 1 clock period
  + 1 unit for integer arithmetic instructions: 1 clock period
  + 1 unit for branch instructions: 1 clock period
  + 1 unit for FP multiplication (pipelined): 8 clock periods
  + 1 unit for FP division (unpipelined): 8 clock periods
  + 1 unit for other FP instructions (pipelined): 2 clock periods
* 2 Common Data Buses (CDBs).
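The practical difference between the pipelined units and the unpipelined FP divider can be illustrated with a small Python sketch. The names `FunctionalUnit`, `UNITS` and `exe_windows` are hypothetical, introduced only for illustration. A pipelined unit can start a new instruction every clock period, while the unpipelined divider cannot accept another division until the previous one has finished, so its initiation interval equals its 8-cycle latency. The sketch deliberately ignores the CDB and commit limits and reservation-station capacity; it only models contention on a single unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    name: str
    latency: int     # clock periods to complete one instruction
    pipelined: bool  # a pipelined unit accepts a new instruction every clock period

    @property
    def initiation_interval(self) -> int:
        # An unpipelined unit stays busy for its whole latency.
        return 1 if self.pipelined else self.latency


# Units from the question. The 1-cycle units are marked pipelined only for
# convenience: with latency 1 the initiation interval is 1 either way.
UNITS = {
    "mem": FunctionalUnit("memory access", 1, True),
    "int": FunctionalUnit("integer arithmetic", 1, True),
    "branch": FunctionalUnit("branch", 1, True),
    "fp_mul": FunctionalUnit("FP multiplication", 8, True),
    "fp_div": FunctionalUnit("FP division", 8, False),
    "fp_other": FunctionalUnit("other FP", 2, True),
}


def exe_windows(unit, ready_cycles):
    """Return the (first, last) EXE clock period of each instruction on `unit`.

    `ready_cycles` lists, in the order the instructions are dispatched to the
    unit, the first clock period in which each one has all its operands.
    Assumes dispatch in that order and ignores CDB and commit limits.
    """
    windows = []
    next_free = 1  # first clock period in which the unit can accept an instruction
    for ready in ready_cycles:
        start = max(ready, next_free)
        windows.append((start, start + unit.latency - 1))
        next_free = start + unit.initiation_interval
    return windows
```

For example, two divisions whose operands become ready in clock periods 12 and 14 (arbitrary values, not the solution) execute in periods 12 to 19 and 20 to 27, whereas two multiplications ready in the same period 5 overlap in the pipelined multiplier (5 to 12 and 6 to 13).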

Let us also assume that:

* Branch predictions are always correct
* Memory accesses never trigger a cache miss.

You should use the following table to describe the behavior of the processor during the execution of the first 2 iterations of the loop, computing the total number of required clock cycles.

| # iteration | Instruction | Issue | EXE | MEM | CDB x2 | COMMIT x2 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | l.d f1,v1(r1) |  |  |  |  |  |
| 1 | l.d f2,v2(r1) |  |  |  |  |  |
| 1 | l.d f3,v3(r1) |  |  |  |  |  |
| 1 | l.d f4,v4(r1) |  |  |  |  |  |
| 1 | mul.d f5, f1, f2 |  |  |  |  |  |
| 1 | mul.d f6, f3, f4 |  |  |  |  |  |
| 1 | div.d f7,f5,f6 |  |  |  |  |  |
| 1 | s.d f7,v5(r1) |  |  |  |  |  |
| 1 | daddui r1,r1,8 |  |  |  |  |  |
| 1 | daddi r2,r2,-1 |  |  |  |  |  |
| 1 | bnez r2,loop |  |  |  |  |  |
| 2 | l.d f1,v1(r1) |  |  |  |  |  |
| 2 | l.d f2,v2(r1) |  |  |  |  |  |
| 2 | l.d f3,v3(r1) |  |  |  |  |  |
| 2 | l.d f4,v4(r1) |  |  |  |  |  |
| 2 | mul.d f5, f1, f2 |  |  |  |  |  |
| 2 | mul.d f6, f3, f4 |  |  |  |  |  |
| 2 | div.d f7,f5,f6 |  |  |  |  |  |
| 2 | s.d f7,v5(r1) |  |  |  |  |  |
| 2 | daddui r1,r1,8 |  |  |  |  |  |
| 2 | daddi r2,r2,-1 |  |  |  |  |  |
| 2 | bnez r2,loop |  |  |  |  |  |
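For reference, the loop in the table computes, element by element, `v5[i] = (v1[i] * v2[i]) / (v3[i] * v4[i])`. The following is a minimal Python sketch of that source-level behavior, assuming that `v1` to `v5` are arrays of doubles, that `r1` is a byte offset advanced by 8 (one double) per iteration, and that `r2` initially holds the number of iterations (assumed greater than 0). The function name `run_loop` is hypothetical.

```python
def run_loop(v1, v2, v3, v4, v5, r2):
    """Mirror the MIPS64 loop body; r2 is the iteration count (assumed > 0)."""
    r1 = 0  # byte offset into the arrays
    while True:
        i = r1 // 8              # index of the double addressed by v*(r1)
        f5 = v1[i] * v2[i]       # l.d f1, l.d f2; mul.d f5, f1, f2
        f6 = v3[i] * v4[i]       # l.d f3, l.d f4; mul.d f6, f3, f4
        v5[i] = f5 / f6          # div.d f7, f5, f6; s.d f7, v5(r1)
        r1 += 8                  # daddui r1, r1, 8
        r2 -= 1                  # daddi r2, r2, -1
        if r2 == 0:              # bnez r2, loop (falls through when r2 == 0)
            break
    return v5
```

The dependence chain visible in each iteration (the loads feed the two multiplications, both products feed the division, and the quotient feeds the store) is what drives the timing to be filled into the table.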